The Formant-Emphasized Feature Vector for Speech Recognition in Noisy Condition
Authors
Abstract
Mel-frequency cepstral coefficients (MFCCs) are widely used as features for speech recognition. In the MFCC extraction process, the spectrum obtained by the Fourier transform of the input speech signal is divided into mel-frequency bands, and the energy of each band is extracted. The coefficients are then obtained by applying the discrete cosine transform to these band energies. In this paper, we compute the output energy of each mel-scaled bandpass filter after applying a weighting function. The weighting function is a Gaussian function centered at the formant frequency. In our experiments, the proposed method performs comparably to standard MFCC in clean conditions and better in noisy conditions.
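The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the exact form of the weighting, so several details here are assumptions rather than the authors' method: the weight is taken to be a constant floor plus one Gaussian per formant, the Gaussian width `sigma_hz` is arbitrary, the filters are triangular, a logarithm is applied before the DCT as in conventional MFCC, and the formant frequencies are supplied by the caller instead of being estimated. All function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular mel-spaced filters, shape (n_filters, n_fft // 2 + 1)."""
    freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    top_mel = float(hz_to_mel(sample_rate / 2.0))
    edges = mel_to_hz(np.linspace(0.0, top_mel, n_filters + 2))
    fbank = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fbank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fbank, freqs

def formant_weight(freqs, formants_hz, sigma_hz=150.0, floor=1.0):
    """Assumed weighting: a floor plus one Gaussian centered on each formant."""
    w = np.full(freqs.shape, floor, dtype=float)
    for f_c in formants_hz:
        w += np.exp(-0.5 * ((freqs - f_c) / sigma_hz) ** 2)
    return w

def dct_ii(x, n_coeffs):
    """Unnormalized DCT-II of a 1-D array, keeping the first n_coeffs terms."""
    n = x.size
    k = np.arange(n_coeffs)[:, None]
    basis = np.cos(np.pi * k * (np.arange(n) + 0.5) / n)
    return basis @ x

def formant_emphasized_mfcc(frame, sample_rate, formants_hz,
                            n_filters=26, n_coeffs=13):
    n_fft = frame.size
    power = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2
    fbank, freqs = mel_filterbank(n_filters, n_fft, sample_rate)
    weight = formant_weight(freqs, formants_hz)
    # Weight the spectrum before accumulating each band's energy.
    energies = fbank @ (weight * power)
    return dct_ii(np.log(energies + 1e-10), n_coeffs)

# Usage on a synthetic frame; the formant values are illustrative only.
sr = 16000
t = np.arange(512) / sr
frame = np.sin(2 * np.pi * 700 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)
coeffs = formant_emphasized_mfcc(frame, sr, formants_hz=[700.0, 1200.0, 2600.0])
```

With an empty `formants_hz` list and `floor=1.0`, the weight is 1 everywhere and the sketch reduces to an ordinary MFCC computation, which makes it easy to compare the two feature vectors on the same frame.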
Similar resources
Improving the performance of MFCC for Persian robust speech recognition
Mel-frequency cepstral coefficients are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in Automatic Speech Recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing which applies to t...
Comparative experiments to evaluate the use of auditory-based acoustic distinctive features and formant cues for robust automatic speech recognition in low-SNR car environments
This paper presents an evaluation of the use of some auditory-based distinctive features and formant cues for robust automatic speech recognition (ASR) in the presence of highly interfering car noise. Comparative experiments have indicated that combining the classical MFCCs with some auditory-based acoustic distinctive cues and either the main formant magnitudes or the formant frequencies of a s...
A new method for robust speech recognition based on missing data using a bidirectional neural network
The performance of speech recognition systems is greatly reduced when speech is corrupted by noise. One common approach to robust speech recognition is the missing-feature method. In this approach, the components of the time-frequency representation of the signal (spectrogram) that have a low signal-to-noise ratio (SNR) are tagged as missing and deleted, then replaced by remained components and statistical ...
Multi-Stream Front-End Processing for Robust Distributed Speech Recognition
This paper investigates a multi-stream-based front-end in Distributed Speech Recognition (DSR). It aims at improving the performance of Hidden Markov Model (HMM)-based systems by combining features based on conventional MFCCs and formant-like features to constitute a new multivariate feature vector. The approach presented in this paper constitutes an alternative to the DSR-XAFE (XAFE: eXtended ...
Statistical Variation Analysis of Formant and Pitch Frequencies in Anger and Happiness Emotional Sentences in Farsi Language
Setting up an emotion recognition or emotional speech recognition system depends directly on how emotion changes speech features. In this research, the influence of anger and happiness on speech was evaluated, and the results were compared with neutral speech. For this purpose, the pitch frequency and the first three formant frequencies were used. The experimental results showed that there are lo...